Comparison of Tied-Mixture and State-Clustered HMMs with Respect to Recognition Performance and Training Method
Abstract
Tied-mixture HMMs have been proposed as acoustic models for large-vocabulary continuous speech recognition and have yielded promising results. They share base distributions and provide more flexibility in choosing the degree of tying than state-clustered HMMs. However, it is unclear which of the two acoustic models is superior when both are trained on the same data. Moreover, the LBG algorithm and the EM algorithm, which are the usual training methods for HMMs, have not been compared. Therefore, in this paper, the recognition performance of the two HMM types and of the two training methods is compared under the same conditions. It was found that the number of parameters and the word error rate of both HMMs are equivalent when the number of codebooks is sufficiently large. It was also found that training with the LBG algorithm achieves a 90% reduction in training time compared to training with the EM algorithm, without degradation of recognition accuracy.
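To make the notion of "degree of tying" and the parameter-count comparison concrete, the following is a minimal sketch of how stored parameters might be counted for the two model types. It assumes diagonal-covariance Gaussians and that every state in a tied-mixture system keeps its own weight vector over the Gaussians of one shared codebook. The function names and all example sizes are hypothetical illustrations and are not taken from the paper.

```python
def tied_mixture_params(num_codebooks, gaussians_per_codebook, num_states, dim):
    """Stored parameters of a tied-mixture system (assumption: diagonal covariances)."""
    # Shared Gaussians: each stores a mean vector and a diagonal variance vector.
    gaussian_params = num_codebooks * gaussians_per_codebook * 2 * dim
    # Each state stores one mixture weight per Gaussian in its codebook.
    weight_params = num_states * gaussians_per_codebook
    return gaussian_params + weight_params


def state_clustered_params(num_clusters, gaussians_per_cluster, dim):
    """Stored parameters of a state-clustered system (assumption: diagonal covariances)."""
    # Each cluster owns its Gaussians: mean, diagonal variance, and one weight each.
    return num_clusters * gaussians_per_cluster * (2 * dim + 1)


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(tied_mixture_params(num_codebooks=1, gaussians_per_codebook=256,
                              num_states=3000, dim=39))
    print(state_clustered_params(num_clusters=3000, gaussians_per_cluster=16, dim=39))
```

Raising `num_codebooks` moves a tied-mixture system toward the state-clustered regime, which is one way to read the abstract's remark that the two model types become equivalent when the number of codebooks is large.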
Similar resources
Hybrid Training Method for Tied Mixture Density Hidden Markov Models Using Learning Vector Quantization and Viterbi Estimation
In this work the output density functions of hidden Markov models are phoneme-wise tied mixture Gaussians. For training these tied mixture density HMMs, modified versions of the Viterbi training and LVQ based corrective tuning are described. The initialization of the mean vectors of the mixture Gaussians is performed by first composing small Self-Organizing Maps representing each phoneme and then ...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Multilingual speech recognition: the 1996 Byblos Callhome system
This paper describes the 1996 Byblos Callhome speech recognition system for Spanish and Egyptian Colloquial Arabic. The system uses a combination of Phonetically Tied-Mixture Gaussian HMMs and State-Clustered Tied-Mixture Gaussian HMMs in a multiple pass decoder. We focus here on the aspects of the system which are language specific and demonstrate the adaptability of the Byblos English system to...
Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual synchronous database
This paper presents methods to improve speech recognition accuracy by incorporating automatic lip reading. The paper improves lip reading accuracy by the following approaches: 1) collection of image and speech synchronous data of 5240 words, 2) feature extraction of 2-dimensional power spectra around a mouth, and 3) sub-word unit HMMs with tied-mixture distribution (Tied-Mixture HMMs). Experiments ...
A new look at HMM parameter tying for large vocabulary speech recognition
Most current state-of-the-art large-vocabulary continuous speech recognition (LVCSR) systems are based on state-clustered hidden Markov models (HMMs). Typical systems use thousands of state clusters, each represented by a Gaussian mixture model with a few tens of Gaussians. In this paper, we show that models with far more parameter tying, like phonetically tied mixture (PTM) models, give better...
Journal title:
- JITR
Volume 7, issue not listed
Pages: not listed
Publication date: 2014